Simplified amino acid alphabets for protein fold recognition and implications for folding.

Authors

  • L R Murphy
  • A Wallqvist
  • R M Levy
Abstract

Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10-12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.
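As a rough illustration of what reducing the amino acid alphabet from 20 to 10 letters means in practice, the sketch below maps each residue to a group representative and compares sequence identity before and after the reduction. The grouping in `GROUPS` is an assumption chosen by broad physicochemical similarity for illustration only; the abstract does not list the actual groups. The function names and the toy sequences are also hypothetical.

```python
# Illustrative only: GROUPS is an assumed 10-letter grouping, not taken from
# the abstract. Each group is written as a string of one-letter residue codes;
# the first letter of each group serves as its representative symbol.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every residue to the first letter of the group containing it."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} appears in two groups")
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping, unknown="X"):
    """Rewrite a sequence in the reduced alphabet; unmapped symbols become `unknown`."""
    return "".join(mapping.get(residue, unknown) for residue in sequence.upper())


def identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


MAPPING = build_mapping(GROUPS)

if __name__ == "__main__":
    # Two toy, pre-aligned fragments (hypothetical, not real proteins).
    seq_a, seq_b = "MKVLA", "LRILA"
    print(identity(seq_a, seq_b))
    print(identity(reduce_sequence(seq_a, MAPPING), reduce_sequence(seq_b, MAPPING)))
```

In this toy case the two fragments differ only by substitutions within a group, so they become identical after reduction. A real evaluation of the kind the abstract describes would instead align sequences from a clustered database and check whether structural homologs are still detected.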

Similar resources

Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices

MOTIVATION Protein and DNA are generally represented by sequences of letters. In a number of circumstances simplified alphabets (where one or more letters would be represented by the same symbol) have proved their potential utility in several fields of bioinformatics including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in mul...

Structural Characteristics of Stable Folding Intermediates of Yeast Iso-1-Cytochrome-c

Cytochrome-c (cyt-c) is an electron transport protein, and it is present throughout the evolution. More than 280 sequences have been reported in the protein sequence database (www.uniprot.org). Though sequentially diverse, cyt-c has essentially retained its tertiary structure or fold. Thus a vast data set of varied sequences with retention of similar structure and fun...

Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force.

BACKGROUND Many protein sequences, often unrelated, adopt similar folds. Sequences folding into the same shape thus form subsets of sequence space. The shape and the connectivity of these sets have implications for protein evolution and de novo design. RESULTS We investigate the topology of these sets for some proteins with known three-dimensional structure using inverse folding techniques. F...

Surveying determinants of protein structure designability across different energy models and amino-acid alphabets: A consensus

A variety of analytical and computational models have been proposed to answer the question of why some protein structures are more "designable" (i.e., have more sequences folding into them) than others. One class of analytical and statistical-mechanical models has approached the designability problem from a thermodynamic viewpoint. These models highlighted specific structural features importa...

Evaluation of local structure alphabets based on residue burial.

Residue burial, which describes a protein residue's exposure to solvent and neighboring atoms, is key to protein structure prediction, modeling, and analysis. We assessed 21 alphabets representing residue burial, according to their predictability from amino acid sequence, conservation in structural alignments, and utility in one fold-recognition scenario. This follows upon our previous work in ...

Journal:
  • Protein engineering

Volume 13, Issue 3

Pages: -

Publication date: 2000